[Datasets] Support tensor columns in `to_tf` and `to_torch`. #24752

Conversation
Nice! Left some comments, mostly for my understanding!
```diff
 )
 df = pd.concat([df1, df2])
 ds = ray.data.from_pandas([df1, df2])
 ds = maybe_pipeline(ds, pipelined)
 torchd = ds.to_torch(label_column="label", batch_size=2)

-num_epochs = 2
+num_epochs = 1 if pipelined else 2
```
Does the pipelined case not work with 2 epochs?
You can't consume a pipeline multiple times; this test never worked with `num_epochs = 2` and pipelining. See `test_to_torch` for this same pattern.
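(For context, a minimal sketch of the single-consumption behavior, assuming the Ray 1.x `Dataset.window()` pipelining API; the exact error type and message are assumptions:)

```python
import ray

ds = ray.data.range(8)
pipe = ds.window(blocks_per_window=2)  # a DatasetPipeline

# First epoch works: the pipeline executes lazily as it is iterated.
for batch in pipe.iter_batches():
    pass

# A second epoch fails, because the pipeline has already been consumed.
try:
    for batch in pipe.iter_batches():
        pass
except RuntimeError as e:
    print(e)
```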
```diff
@@ -246,5 +246,3 @@ This feature currently comes with a few known limitations that we are either act

 * All tensors in a tensor column currently must be the same shape. Please let us know if you require heterogeneous tensor shape for your tensor column! Tracking issue is `here <https://github.com/ray-project/ray/issues/18316>`__.
 * Automatic casting via specifying an override Arrow schema when reading Parquet is blocked by Arrow supporting custom ExtensionType casting kernels. See `issue <https://issues.apache.org/jira/browse/ARROW-5890>`__. An explicit ``tensor_column_schema`` parameter has been added for :func:`read_parquet() <ray.data.read_api.read_parquet>` as a stopgap solution.
-* Ingesting tables with tensor columns into pytorch via ``ds.to_torch()`` is blocked by pytorch supporting tensor creation from objects that implement the `__array__` interface. See `issue <https://github.com/pytorch/pytorch/issues/51156>`__. Workarounds are being `investigated <https://github.com/ray-project/ray/issues/18314>`__.
```
👍 doc updates. Could we also add some explicit examples / text in the docs on how to do this?
Hmm, document how to do what? This should now work transparently with tensor columns, without any special considerations by the user, so there wouldn't be anything special to document at exchange time.
It's still not clear to me, as someone only passably familiar with the tensor code, how exactly I would use it. Could we have a runnable example showing it end to end with a tensor dataset?

Basically whatever's in the unit test, but in a friendlier example form.
Agreed that an e2e example for tensor data is a good idea, and I'm about to work on creating one as part of the Datasets GA docs work in a separate PR. But the point I'm making here is that the user shouldn't have to do anything differently in the `.to_torch()` or `.to_tf()` call just because their data contains tensor columns.

The tensor column --> ML framework tensor conversion will happen automatically, without any tensor-column-specific args to `.to_torch()` or `.to_tf()`, and without special considerations for the upstream tensor column creation other than what's already documented in the tensor column guide: creating the tensor column in the first place. For `.to_tf()`, the tensor spec of the columns needs to be given, but that is already a requirement for all columns.

Since this example work is already planned for before GA, could we merge this in order to unblock the user PoC that depends on it?
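(To illustrate the claim, here's a minimal sketch of passing a tensor-column table through `to_torch()` with no tensor-specific arguments; the column names and `TensorArray` construction are illustrative, not taken from this PR:)

```python
import numpy as np
import pandas as pd
import ray
from ray.data.extensions import TensorArray

# A table with one tensor column and one scalar label column.
df = pd.DataFrame({
    "feature": TensorArray(np.arange(16, dtype=np.float32).reshape(4, 2, 2)),
    "label": [0, 1, 0, 1],
})
ds = ray.data.from_pandas([df])

# The same call as for purely scalar columns; no tensor-specific args needed.
torch_ds = ds.to_torch(label_column="label", batch_size=2, unsqueeze_label_tensor=False)
```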
I don't think the example I'm asking for would be any longer than this thread already is. Let's set a high standard for documentation.
I pushed up an example so we can resolve this. I'll have to rewrite/delete this example early next week once other work is merged, which is one of the reasons I didn't want to block the user PoC on this example, so I'd appreciate it if we could be a bit more pragmatic in the future.
Thanks. In fact, I don't see any harm in blocking this PR on the e2e example. We should treat docs like unit tests, not as a separate "optional" component. There shouldn't really be a separate e2e example effort, for example.

I left a separate comment on making the example here more fleshed out (maybe this is copying work from your other effort).
```python
        isinstance(dtype, pd.api.extensions.ExtensionDtype)
        for dtype in df.dtypes
    ):
        values = np.stack([col.to_numpy() for _, col in df.items()], axis=1)
```
Hmm, do we want to convert all columns to NumPy if any column is an ExtensionDtype? Or should we only do `col.to_numpy()` if the dtype of that particular column is an ExtensionDtype?
It doesn't matter; `df.values` converts all columns to NumPy ndarrays anyway, so it would be exactly the same.
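(A quick sketch of the equivalence being described, for plain NumPy-dtype columns with no extension dtypes:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# df.values materializes every column into one 2D NumPy array, so stacking
# the per-column ndarrays by hand produces the same result.
stacked = np.stack([col.to_numpy() for _, col in df.items()], axis=1)
assert np.array_equal(df.values, stacked)
```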
```
for the tensor columns, including the shape of each underlying tensor element in said
column.

.. code-block:: python
```
This isn't really e2e. Could we start with an example dataset generated via `ray.data.range_tensor()` (maybe using `add_column` to add extra cols), and follow with a `DataLoader` to read the data afterwards?
Approved, but I'd prefer we always ship with e2e examples.
BTW, just to add some color about end-to-end examples: I think there's a bit of a miscommunication; let me try to elaborate on what @ericl is trying to say. Clark, from what I understand, you are going to build an E2E example very soon, so Eric is seemingly asking for redundant work. But I think what Eric is asking for is more of a runnable example with more context in the docstrings. From the user perspective, I think there are two things that are hard to understand about the current snippets in the tensor user guide:
(This is a bit of a departure from the current way of writing docs, I know. But I think we can start upping the standards, especially with our new focus on building out great docs. Also, these esoteric minimal snippets are a recent major pain point I've had with Serve, and I'm a bit scarred from that experience ;) )
@ericl @richardliaw I completely agree that this should be the desired end-state! What I'm pushing back against is expanding the scope of this narrow PR to include docs work that is already in progress in other branches/PRs and has already been separately allocated. To be clear, the data team is already working on each of the items y'all have mentioned, just not in this PR.

I'd rather have discussions on those e2e examples in those upcoming PRs than in this PR, which is narrowly focused on unblocking basic use of the ML framework integrations with tensor data.
I think the confusion lies in exactly how much expanded scope we're talking about here (the word "end-to-end" may have conflated connotations). You'll make me happy with just a small 5-10 line addendum (which I tried to do below, btw, but I really couldn't get it to work):

```diff
diff --git a/doc/source/data/dataset-tensor-support.rst b/doc/source/data/dataset-tensor-support.rst
index 6c6baea07..a268b4cd6 100644
--- a/doc/source/data/dataset-tensor-support.rst
+++ b/doc/source/data/dataset-tensor-support.rst
@@ -161,12 +161,22 @@ automatically converted to a Torch/TensorFlow tensor without incurring any copie
 .. code-block:: python
 
+    import ray
+    import torch
+
+    ds = ray.data.range_tensor(10000, shape=(3, 5))
+
     # Convert the dataset to a Torch IterableDataset.
     torch_ds = ds.to_torch(
         label_column="label",
         batch_size=4,
         unsqueeze_label_tensor=False,
     )
+    dataloader = torch.utils.data.DataLoader(torch_ds, num_workers=0)
+
+    for x, y in dataloader:
+        # train model(x, y)
+
     # Convert the dataset to a TensorFlow Dataset.
     tensor_element_shape = (32, 32)
```
The logic from my perspective is the following: a feature PR should be self-contained with the following things:

Obviously there are different degrees of investment for 1, 2, and 3. But, on the other hand, you still need comprehensive unit tests to verify correctness (which you've done a great job of). This PR does not provide enough documentation for even the proficient user to use it successfully (I tried and failed...).
@richardliaw Again, agreed on those standards for PRs; the data team is in the middle of a massive docs push in order to create complete docs coverage at that standard, and we'll be strictly adhering to that standard post-GA. In order to unblock this PR and resolve this discussion, I'll add an e2e example to the tensor feature guide. But in case the motivation I gave above was not clear, I was trying to avoid adding this e2e example to the tensor feature guide, since it will end up being redundant work that's obviated in the next day or two:

Either of the parallel work items in (1) or (2) will make this e2e example in the tensor data feature guide obsolete, and I was aiming to quickly get this bug fix to a user doing a PoC, so I was hoping that we could be pragmatic here. Anyways, I'll add an e2e example to get this unblocked.
@richardliaw As for your e2e example attempt, did you check out the […]?

You're also generating a single […]. I'm going to be reworking the […]. I'm guessing that an e2e example in the docstring and the feature guide will make this all much more clear, which is why we have a dedicated docs task for adding these! 😄
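(For readers following along, here's a hedged sketch of what a working version of that addendum might look like. It assumes that a pure-tensor dataset from `range_tensor()` has no "label" column, and that `to_torch()` yields a `None` label when `label_column` is omitted; this is a reconstruction, not the example that was ultimately added:)

```python
import ray
import torch

# range_tensor() produces a dataset with a single tensor column.
ds = ray.data.range_tensor(1000, shape=(3, 5))

# No "label" column exists here, so omit label_column entirely.
torch_ds = ds.to_torch(batch_size=4)

# batch_size=None: to_torch() already batches, so don't re-batch in the DataLoader.
dataloader = torch.utils.data.DataLoader(torch_ds, batch_size=None, num_workers=0)

for features, label in dataloader:
    # label is None since no label_column was given; train on features here.
    break
```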
@ericl @richardliaw @amogkam Any final feedback on the example or otherwise? If not, I'm planning on merging this today.
This PR adds support for tensor columns in the `to_tf()` and `to_torch()` APIs.

For Torch, this involves an explicit extension array check and (zero-copy) conversion of the tensor column to a NumPy array before converting the column to a Torch tensor.

For TensorFlow, this involves bypassing `df.values` when converting tensor feature columns to NumPy arrays, instead manually creating a single NumPy array from the column Series.

In both cases, I think that the UX around heterogeneous feature columns and squeezing the column dimension could be improved, but I'm saving that for a future PR.
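(A minimal sketch of the TensorFlow-side idea described above: building the feature array from the tensor column's Series directly rather than via `df.values`. The column name and data are illustrative, and it assumes the `TensorArray` extension type's `to_numpy()` returns the underlying `(n, *shape)` ndarray:)

```python
import numpy as np
import pandas as pd
from ray.data.extensions import TensorArray

df = pd.DataFrame({"feature": TensorArray(np.arange(8, dtype=np.float32).reshape(4, 2))})

# df.values doesn't reliably preserve the per-element tensor shape for extension
# columns, whereas converting the column Series directly keeps it intact.
features = np.asarray(df["feature"].to_numpy())
print(features.shape)  # (4, 2)
```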
Related issue number
Closes #18314, #18315
Checks

- [x] I've run `scripts/format.sh` to lint the changes in this PR.